Content tagging

ABSTRACT

Systems, methods, devices, media, and computer readable instructions are described for local image tagging in a resource constrained environment. One embodiment involves processing image data using a deep convolutional neural network (DCNN) comprising at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer; processing the image data using at least the first layer of the first subgraph to generate first intermediate output data; processing, by the mobile device, the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and, in response to a determination that each layer reliant on the first intermediate output data has completed processing, deleting the first intermediate output data from the mobile device. Additional embodiments involve convolving entire pixel resolutions of the image data against kernels in different layers of the DCNN.

CLAIM OF PRIORITY

This application is a continuation of and claims the benefit of priority of U.S. patent application Ser. No. 16/192,419, filed on Nov. 15, 2018, which is a continuation of and claims the benefit of priority of U.S. patent application Ser. No. 15/247,697, filed on Aug. 25, 2016, which claims the benefit of priority of U.S. Provisional Application Ser. No. 62/218,965, filed on Sep. 15, 2015 and U.S. Provisional Application Ser. No. 62/358,461, filed on Jul. 5, 2016, each of which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

Embodiments of the present disclosure relate generally to resource limited computing systems and image processing to tag or label content using such resource limited computing systems. Some embodiments particularly relate to deep convolutional neural networks used for image tagging.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a block diagram showing an example messaging system for exchanging data (e.g., messages and associated content) over a network in accordance with some embodiments.

FIG. 2 is a block diagram illustrating further details regarding a messaging system, according to example embodiments.

FIG. 3 is a schematic diagram illustrating data which may be stored in the database of the messaging server system, according to certain example embodiments.

FIG. 4 is a schematic diagram illustrating a structure of a message, according to some embodiments, generated by a messaging client application for communication.

FIG. 5 illustrates aspects of systems and devices for image tagging and local visual search in accordance with some embodiments.

FIG. 6 illustrates aspects of a device for image tagging and visual search in accordance with some embodiments.

FIG. 7 illustrates aspects of a system for image tagging and visual search according to some embodiments.

FIG. 8 illustrates aspects of a DCNN in accordance with various embodiments described herein.

FIG. 9 illustrates aspects of operations for image processing and visual search in accordance with some embodiments.

FIG. 10 illustrates aspects of operations for image processing and visual search in accordance with embodiments described herein.

FIG. 11 illustrates aspects of a method for image tagging and visual search in accordance with embodiments described herein.

FIG. 12 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various hardware architectures herein described and used to implement various embodiments.

FIG. 13 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein.

DETAILED DESCRIPTION

Embodiments described herein relate to resource limited computing systems and image processing to tag or label content for visual search on such resource limited computing systems. Some embodiments particularly relate to deep convolutional neural networks used for image tagging to enable visual search. Some embodiments operate in an ephemeral messaging system with an integrated content storage system for optional non-ephemeral storage of content.

Visual search refers to systems that allow users to input text in order to identify images associated with the text. For example, a text input of “beach” would result in a visual search result of images including a beach in at least a portion of the images. Traditional operations for enabling such visual search require significant computing resources to analyze images to generate “tags” or text metadata associated with an image based on object recognition or other analysis tools to identify the content of the image. Because of such resource demands, network enabled mobile devices (e.g. smartphones or tablets) typically transfer some or all of the image processing to networked cloud computing resources. Such cloud based techniques, however, have multiple drawbacks. These include difficulty scaling and costs for computational and memory resources, particularly in an environment serving millions of users. Additionally, network resources to transfer images are also costly. Further still, transferring images over a network involves privacy concerns, where users may prefer not to generate and store copies of their images in a cloud environment.

Embodiments described herein provide technical solutions to the technical resource limitation problems presented above with particular processor implemented object recognition operations. The embodiments described herein allow the mobile device to process and tag images on the mobile device under resource constrained conditions in a way not possible with previously known systems. In addition to enabling object recognition on mobile devices, additional benefits are provided by making content easily searchable in environments without network access, and by providing security and privacy by allowing search without making content accessible to a network. Thus, as described below, the embodiments improve the operation of mobile devices by enabling local tagging and visual search. Particular embodiments provide this improvement using deep convolutional neural networks (DCNN), knowledge graph(s), natural language processing, content metadata, or various combinations of the above.

“Content”, as described herein, refers to one or more images or video clips (e.g. snaps) captured by an electronic device, as well as any associated metadata descriptions and graphics or animation added to the image or video clip. This includes metadata generated by an electronic device capturing an image or video, as well as metadata that may be associated later by other devices. A “piece of content” refers to an individual image or video clip captured by a client device with any changes made to the image or video clip (e.g. transformations, filters, added text, etc.). Individual pieces of content may have multimedia elements, including drawings, text, animations, emoji, or other such elements added along with image or video clip elements. Content captured by an image sensor of a client device may be sent, along with any added multimedia elements from a user, via a network to other client devices as part of a social sharing network. Individual pieces of content may have time limits or associated display times, which are within a display threshold set by a system. For example, an embodiment system may limit video clips to 10 seconds or less, and may allow users to select display times less than 10 seconds for image content.

A “content message” as referred to herein refers to the communication of content between one or more users via the system. Content may also be sent from a client device to a server system to be shared generally with other system users. Some embodiments limit content messages to images or video clips captured using an interface that does not allow the content to be stored and sent later, but instead uses an associated content message with a single piece of content and any added multimedia to be sent before any other action is taken on the device. Embodiments described herein relate to methods of grouping such content into content collections (e.g., stories). In various systems, content messages may be sent from one individual user to another individual user, as, for example, an ephemeral message, in addition to the ability to send content messages to a server computer system for inclusion in various content collections.

A “content collection” as described herein is an ordered set of content (e.g. a story). The individual pieces of content that make up a particular content collection may be related in a variety of different ways. For example, in some embodiments, a content collection includes all pieces of content marked as public that are sent to a server system from a particular user within a certain time frame (e.g., within the past 24 hours). Access to such a content collection can be limited to certain other users (e.g., friends) identified by the user that generates the content for the collection. In some other embodiments, content collections include pieces of content from different users that are related by time, location, content, or other metadata. In some embodiments, content collections are referred to as stories. A story or content collection may be generated from pieces of content that are related in a variety of different ways, as is described in more detail throughout this document.

“Ephemeral” content refers to content with an associated trigger for deletion. In some embodiments, for example, a content capture interface (e.g. such as the interface illustrated in FIG. 8) enables the capture of images or video clips. An ephemeral messaging interface allows for the captured content to be communicated to other users as part of an ephemeral message that will be deleted after viewing, or to be sent to a “live” content collection that is available for viewing by other accounts in the system for a limited period of time. The ephemeral content is not stored on the device that captured the content unless a user elects to store the content. The application for the ephemeral messaging system thus defaults to deletion of the content unless the user elects to store the content in local application storage.

Thus, as described herein, some embodiments include the operations of an ephemeral messaging application operating on a client device (e.g. a smartphone) as part of an ephemeral messaging system that includes an operation for non-ephemeral storage of content. Various embodiments therefore include local tagging and search on a device for images captured by the device, as well as similar tagging and search for non-ephemeral images received through a messaging application.

FIG. 1 is a block diagram showing an example messaging system 100 for exchanging data (e.g., messages and associated content) over a network. The messaging system 100 includes multiple client devices 102, each of which hosts a number of applications including an ephemeral messaging client application 104. Each messaging client application 104 is communicatively coupled to other instances of the messaging client application 104 and a messaging server system 108 via a network 106 (e.g., the Internet).

Ephemeral messaging client application 104 operates as described below to enable communication of ephemeral content messages between devices associated with user accounts. In embodiments described herein, ephemeral messaging client application 104 also enables indexing and storage of content initially generated for ephemeral messaging in both a non-private format that is synchronized with content storage 124 on a server, as well as private content that may or may not be synchronized with content storage 124 depending on the level of security selected by a user of a client device 102.

In addition to content stored on a client device 102 using application 104, content may also be stored using an application directly associated with the camera on the device (e.g. camera roll storage) that is separate from the storage for application 104.

As part of operation of an ephemeral messaging client application 104, various tabs or user interface screens are available. One such screen is a content collection (e.g. stories) screen that includes icons for various stories, including a live story (e.g. my story) for a user account of the specific client device; stories for content sources such as magazines, newspapers, television networks, etc.; stories for locations such as particular cities, universities, etc.; and stories for related accounts that are available to the account associated with the device displaying the interface (e.g. stories from friends).

Additionally, a user interface for content within application 104 includes a display for content stored within application 104. This may include a tab for all content, a separate tab for individual images and videos, and a tab for content collections. In some embodiments, such a user interface may present an individual presentation for a story which represents a number of images and/or videos. One embodiment uses a circular shape with at least a portion of an image from a piece of content displayed within the circle. The piece of content represented within the circle may change over time, such that each piece of content associated with a content collection is presented for a period of time. In one embodiment, part of an image from each piece of content is presented for one second, cycling through each piece of content in order and returning to the beginning after each piece of content in the story has been represented. For example, in a content collection with 15 pieces of content including images and videos, each piece of content has an associated image portion displayed within the circle for 1 second, with the pattern repeating after 15 seconds. On a device with multiple content collections, multiple such circles with rotating internal image content are displayed simultaneously within the interface on a device 102. In an “all” display, pieces of content may be displayed within rectangular areas next to circular areas for stories. This may involve various patterns of circles and squares as described herein.

As individual pieces of content are stored within the application storage, the user interface allows these individual pieces of content to be accessed within application 104. This may involve automatic ordering based on a generation time for each piece of content, or any user selected ordering. Individual pieces of content and content collections can then be presented together or separately in the user interface of application 104. As described below, image tagging operations may be performed by the mobile device on such content to enable visual searching of content collections or of any content within the application storage using the various embodiments described herein. This image tagging is performed on the mobile device without transferring images to networked resources to assist with the image processing.

Additionally, in some embodiments, the separate camera roll storage may be accessed or presented within application 104. In some embodiments, application 104 accesses separate storage of client device 102 to identify content, and presents the content within the camera roll storage within a portion of the application 104 interface. In some embodiments, this camera roll storage is not synchronized with content storage 124 on the server. Content that is moved or copied from camera roll storage to application storage is synchronized and backed up to content storage 124. If content from camera roll storage is used for a content collection or is marked private within application 104, the content is automatically moved to application storage. Depending on privacy settings, the content may be encrypted and/or deleted from camera roll storage after it is placed in application 104 storage. Such content may be automatically processed to generate tags and to make the content available for visual search within the application as it is moved from camera roll storage to application storage.

When content is generated using the content capture interface of application 104, metadata is stored with the content, such as capture time, capture location, available filters/overlays, temperature, device speed, user added drawings, or other such system based metadata. In some embodiments, this information and the availability of certain overlay filters is stored, and the content is editable within application storage based on this metadata. For example, if application 104 allows ephemeral content to be edited to add an overlay filter for a location only when the device is within the location, then in some embodiments, this limited location overlay filter is still available for the content captured in that location even after the device moves to a different location. The content stored in application storage may then later be communicated using a chat or messaging interface using special filters based on metadata. The ability to edit content from application storage to add overlays or drawings and then communicate the content applies both to individual pieces of content as well as entire content collections. Such metadata may be used with image tags to enhance visual search as described below.

For content stored in camera roll storage, some metadata may be accessed by the application 104 and similarly used to apply some filters to the content. Application 104 allows temporary editing and drawing on top of content from camera roll storage. In some embodiments, a user may selectively overwrite the camera roll file with edits. In other embodiments, the application 104 does not overwrite camera roll content with versions of the content including filters or overlay drawings, but makes a local copy in application storage. If the user does not store copies of camera roll content with added overlays within application storage or camera roll storage, the additions are lost when the user closes application 104 or navigates away from the camera roll content with added overlays.

As described below, various metadata and object recognition processing is applied to content in some embodiments. This enables searching and sorting of content within application storage. Text searching can be used to identify content based on metadata such as location name, content of images, context information from images, or any other such information (e.g. “Ho” may identify images associated with Houston, Texas and “home” as identified by metadata and object recognition data). In some embodiments, this searching is used to present one or more interfaces with dynamically sorted content. One example embodiment includes user interfaces for content that was generated near to a current location of the device presenting the interface. Another example embodiment includes a “flashback” interface that displays content associated with a related day or time (e.g. the same day or week of a previous year, or a previous instance of a current event that is associated with a current time/location of the client device 102). Similarly, more complex search and sorting associations can be generated. For example, certain times (e.g. days of the week) can be associated with presenting certain image content. In one embodiment, Friday afternoons can be associated with images of the beach, so that a search category of beach images is presented to a user only on Fridays after noon, or the search category is ranked and presented higher on Fridays after noon. In some embodiments, a server controlled system is used to select search categories based on server selected criteria. For example, public news trends can be used to select certain search criteria which is then used to present certain groupings of local content from application storage. In one example, an Olympic victory in a certain event for a certain country may initiate a server side command for devices within the associated country to prioritize local search groups associated with that event. As described herein, however, while search categories may be managed by a server system, the image tagging used to generate local results is performed on the mobile device.
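As a hedged illustration of the local text search described above, the following sketch shows prefix matching of a query against tags and metadata stored on the device. The index shape, function name, and example entries are assumptions for illustration only, not the described implementation.

```python
# Hypothetical sketch: prefix matching a query against locally stored
# tag and metadata strings (e.g. "Ho" matching "Houston, Texas" and "home").
def search_local_content(query, index):
    """index: dict mapping content_id -> list of tag/metadata strings."""
    q = query.strip().lower()
    return [
        content_id
        for content_id, terms in index.items()
        if any(term.lower().startswith(q) for term in terms)
    ]

index = {
    "img_1": ["Houston, Texas", "skyline"],
    "img_2": ["home", "kitchen"],
    "img_3": ["beach", "sunset"],
}
print(search_local_content("Ho", index))  # ['img_1', 'img_2']
```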

In some embodiments, application storage is divided into non-private content and private content. In such embodiments, non-private content is not encrypted and is synchronized with content storage 124 on a server and presented within user interfaces of application 104 with no security. Application storage content marked as private can be managed in different ways. In some embodiments, such content is encrypted, and is only displayed within the user interfaces of application 104 after a password is entered into application 104. In one embodiment, such a password is a 4-digit numerical personal identification number (PIN). In another embodiment, a 16 character passphrase is used for greater encryption. In some such embodiments, a user is notified that the server system does not store passphrases and the user will lose all access to private content if a passphrase is forgotten. In some embodiments, private content is encrypted at the client device, with encrypted copies of the content synchronized to the content storage 124 within a server system, such that the content can be accessed at other devices associated with a shared account using the PIN or passphrase. In other embodiments, private content is stored locally with no copy stored within server-based content storage 124. In some embodiments, private content is shareable via messaging or chat interfaces after a PIN or passphrase is entered. In some embodiments, private content is not shareable.

Application 104 operates, as described herein, as part of an ephemeral messaging system. Accordingly, each messaging client application 104 is able to communicate and exchange data with another messaging client application 104 and with the messaging server system 108 via the network 106. The data exchanged between messaging client applications 104, and between a messaging client application 104 and the messaging server system 108, includes functions (e.g., commands to invoke functions) as well as payload data (e.g., text, audio, video or other multimedia data).

The messaging server system 108 provides server-side functionality via the network 106 to a particular messaging client application 104. While certain functions of the messaging system 100 are described herein as being performed by either a messaging client application 104 or by the messaging server system 108, it will be appreciated that the location of certain functionality either within the messaging client application 104 or the messaging server system 108 is a design choice. Visual tagging operations as described for various embodiments below are, however, performed on the mobile device for the reasons described above.

The messaging server system 108 supports various services and operations that are provided to the messaging client application 104. Such operations include transmitting data to, receiving data from, and processing data generated by the messaging client application 104. In some embodiments, this data includes message content, client device information, geolocation information, media annotation and overlays, message content persistence conditions, social network information, and live event information, as examples. In other embodiments, other data is used. Data exchanges within the messaging system 100 are invoked and controlled through functions available via user interfaces (UIs) of the messaging client application 104.

Turning now specifically to the messaging server system 108, an Application Program Interface (API) server 110 is coupled to, and provides a programmatic interface to, an application server 112. The application server 112 is communicatively coupled to a database server(s) 118, which facilitates access to a database(s) 120 in which is stored data associated with messages processed by the application server 112.

Dealing specifically with the Application Program Interface (API) server 110, this server 110 receives and transmits message data (e.g., commands and message payloads) between the client device 102 and the application server 112. Specifically, the Application Program Interface (API) server 110 provides a set of interfaces (e.g., routines and protocols) that can be called or queried by the messaging client application 104 in order to invoke functionality of the application server 112. The Application Program Interface (API) server 110 exposes various functions supported by the application server 112, including account registration; login functionality; the sending of messages via the application server 112 from a particular messaging client application 104 to another messaging client application 104; the sending of media files (e.g., images or video) from a messaging client application 104 to the messaging server application 114, and for possible access by another messaging client application 104; the setting of a collection of media data (e.g., story); the retrieval of a list of friends of a user of a client device 102; the retrieval of such collections; the retrieval of messages and content; the adding and deletion of friends to a social graph; the location of friends within a social graph; and the opening of an application event (e.g., relating to the messaging client application 104).

The application server 112 hosts a number of applications and subsystems, including a messaging server application 114, an image processing system 116, a social network system 122, and content storage 124. The messaging server application 114 implements a number of message processing technologies and functions, particularly related to the aggregation and other processing of content (e.g., textual and multimedia content) included in messages received from multiple instances of the messaging client application 104. As will be described in further detail, the text and media content from multiple sources may be aggregated into collections of content (e.g., called stories or galleries). These collections are then made available, by the messaging server application 114, to the messaging client application 104. Other processor and memory intensive processing of data may also be performed server-side by the messaging server application 114, in view of the hardware requirements for such processing.

The application server 112 also includes an image processing system 116 that is dedicated to performing various image processing operations, typically with respect to images or video received within the payload of a message at the messaging server application 114.

The social network system 122 supports various social networking functions and services, and makes these functions and services available to the messaging server application 114. To this end, the social network system 122 maintains and accesses an entity graph 304 (shown in FIG. 3) within the database(s) 120. Examples of functions and services supported by the social network system 122 include the identification of other users of the messaging system 100 with which a particular user has relationships or is “following,” and also the identification of other entities and interests of a particular user.

Content storage 124 interacts with local storage from client devices to synchronize non-ephemeral non-private content between multiple devices associated with a single user account, and to manage any communication of such content to devices of other accounts as part of a communication from one account to another account.

The application server 112 is communicatively coupled to a database server(s) 118, which facilitates access to a database(s) 120 in which is stored data associated with messages processed by the messaging server application 114.

FIG. 2 is a block diagram illustrating further details regarding the messaging system 100, according to example embodiments. Specifically, the messaging system 100 is shown to comprise the messaging client application 104 and the application server 112, which in turn embody a number of subsystems, namely an ephemeral timer system 202, a collection management system 204, and an annotation system 206.

The ephemeral timer system 202 is responsible for enforcing the temporary access to content permitted by the messaging client application 104 and the messaging server application 114. To this end, the ephemeral timer system 202 incorporates a number of timers that, based on duration and display parameters associated with a message, or collection of messages (e.g., a SNAPCHAT story), selectively display and enable access to messages and associated content via the messaging client application 104. Further details regarding the operation of the ephemeral timer system 202 are provided below.

The collection management system 204 is responsible for managing collections of media (e.g., collections of text, image, video, and audio data). In some examples, a collection of content (e.g., messages, including images, video, text, and audio) may be organized into an “event gallery” or an “event story.” Such a collection may be made available for a specified time period, such as the duration of an event to which the content relates. For example, content relating to a music concert may be made available as a “story” for the duration of that music concert. The collection management system 204 may also be responsible for publishing an icon that provides notification of the existence of a particular collection to the user interface of the messaging client application 104.

The annotation system 206 provides various functions that enable a user to annotate or otherwise modify or edit media content associated with a message. For example, the annotation system 206 provides functions related to the generation and publishing of media overlays for messages processed by the messaging system 100. The annotation system 206 operatively supplies a media overlay (e.g., a SNAPCHAT filter) to the messaging client application 104 based on a geolocation of the client device 102. In another example, the annotation system 206 operatively supplies a media overlay to the messaging client application 104 based on other information, such as social network information of the user of the client device 102. A media overlay may include audio and visual content and visual effects. Examples of audio and visual content include pictures, texts, logos, animations, and sound effects. An example of a visual effect includes color overlaying. The audio and visual content or the visual effects can be applied to a media content item (e.g., a photo) at the client device 102. For example, the media overlay includes text that can be overlaid on top of a photograph taken by the client device 102. In another example, the media overlay includes an identification of a location overlay (e.g., Venice beach), a name of a live event, or a name of a merchant overlay (e.g., Beach Coffee House). In another example, the annotation system 206 uses the geolocation of the client device 102 to identify a media overlay that includes the name of a merchant at the geolocation of the client device 102. The media overlay may include other indicia associated with the merchant. The media overlays may be stored in the database(s) 120 and accessed through the database server(s) 118.

In one example embodiment, the annotation system 206 provides a user-based publication platform that enables users to select a geolocation on a map, and upload content associated with the selected geolocation. The user may also specify circumstances under which a particular media overlay should be offered to other users. The annotation system 206 generates a media overlay that includes the uploaded content and associates the uploaded content with the selected geolocation.

In another example embodiment, the annotation system 206 provides a merchant-based publication platform that enables merchants to select a particular media overlay associated with a geolocation via a bidding process. For example, the annotation system 206 associates the media overlay of a highest bidding merchant with a corresponding geolocation for a predefined amount of time.

In various embodiments, visual search operations described herein may process copies of images without added annotations, copies of images including annotations, or both, depending on user selections and system settings.

FIG. 3 is a schematic diagram illustrating data 300 which may be stored in the database(s) 120 of the messaging server system 108, according to certain example embodiments. While the content of the database(s) 120 is shown to comprise a number of tables, it will be appreciated that the data could be stored in other types of data structures (e.g., as an object-oriented database).

The database(s) 120 includes message data stored within a message table 314. The entity table 302 stores entity data, including an entity graph 304. Entities for which records are maintained within the entity table 302 may include individuals, corporate entities, organizations, objects, places, events, etc. Regardless of type, any entity regarding which the messaging server system 108 stores data may be a recognized entity. Each entity is provided with a unique identifier, as well as an entity type identifier (not shown).

The entity graph 304 furthermore stores information regarding relationships and associations between entities. Such relationships may be social, professional (e.g., work at a common corporation or organization), interest-based, or activity-based, merely for example.

The database(s) 120 also stores annotation data, in the example form of filters, in an annotation table 312. Filters for which data is stored within the annotation table 312 are associated with and applied to videos (for which data is stored in a video table 310) and/or images (for which data is stored in an image table 308). Filters, in one example, are overlays that are displayed as overlaid on an image or video during presentation to a recipient user. Filters may be of various types, including user-selected filters from a gallery of filters presented to a sending user by the messaging client application 104 when the sending user is composing a message. Other types of filters include geolocation filters (also known as geo-filters), which may be presented to a sending user based on geographic location. For example, geolocation filters specific to a neighborhood or special location may be presented within a user interface by the messaging client application 104, based on geolocation information determined by a GPS unit of the client device 102. Another type of filter is a data filter, which may be selectively presented to a sending user by the messaging client application 104, based on other inputs or information gathered by the client device 102 during the message creation process. Examples of data filters include current temperature at a specific location, a current speed at which a sending user is traveling, battery life for a client device 102, or the current time.

Other annotation data that may be stored within the image table 308 is so-called “lens” data. A “lens” may be a real-time special effect and sound that may be added to an image or a video.

As mentioned above, the video table 310 stores video data which, in one embodiment, is associated with messages for which records are maintained within the message table 314. Similarly, the image table 308 stores image data associated with messages for which message data is stored in the entity table 302. The entity table 302 may associate various annotations from the annotation table 312 with various images and videos stored in the image table 308 and the video table 310.

A story table 306 stores data regarding collections of messages and associated image, video, or audio data, which are compiled into a collection (e.g., a SNAPCHAT story or a gallery). The creation of a particular collection may be initiated by a particular user (e.g., each user for which a record is maintained in the entity table 302). A user may create a “personal story” in the form of a collection of content that has been created and sent/broadcast by that user. To this end, the user interface of the messaging client application 104 may include an icon that is user selectable to enable a sending user to add specific content to his or her personal story.

A collection may also constitute a “live story,” which is a collection of content from multiple users that is created manually, automatically, or using a combination of manual and automatic techniques. For example, a “live story” may constitute a curated stream of user-submitted content from various locations and events. Users whose client devices 102 have location services enabled and are at a common location event at a particular time may, for example, be presented with an option, via a user interface of the messaging client application 104, to contribute content to a particular live story. The live story may be identified to the user by the messaging client application 104, based on his or her location. The end result is a “live story” told from a community perspective.

A further type of content collection is known as a “location story”, which enables a user whose client device 102 is located within a specific geographic location (e.g., on a college or university campus) to contribute to a particular collection. In some embodiments, a contribution to a location story may require a second degree of authentication to verify that the end user belongs to a specific organization or other entity (e.g., is a student on the university campus).

FIG. 4 is a schematic diagram illustrating a structure of a message 400, according to some embodiments, generated by a messaging client application 104 for communication to a further messaging client application 104 or the messaging server application 114. The content of a particular message 400 is used to populate the message table 314 stored within the database(s) 120, accessible by the messaging server application 114. Similarly, the content of a message 400 is stored in memory as “in-transit” or “in-flight” data of the client device 102 or the application server 112. The message 400 is shown to include the following components:

-   A message identifier 402: a unique identifier that identifies the message 400.
-   A message text payload 404: text, to be generated by a user via a user interface of the client device 102 and that is included in the message 400.
-   A message image payload 406: image data, captured by a camera component of a client device 102 or retrieved from memory of a client device 102, and that is included in the message 400.
-   A message video payload 408: video data, captured by a camera component or retrieved from a memory component of the client device 102 and that is included in the message 400.
-   A message audio payload 410: audio data, captured by a microphone or retrieved from the memory component of the client device 102, and that is included in the message 400.
-   A message annotations 412: annotation data (e.g., filters, stickers or other enhancements) that represent annotations to be applied to message image payload 406, message video payload 408, or message audio payload 410 of the message 400.
-   A message duration parameter 414: parameter value indicating, in seconds, the amount of time for which content of the message 400 (e.g., the message image payload 406, message video payload 408, message audio payload 410) is to be presented or made accessible to a user via the messaging client application 104.
-   A message geolocation parameter 416: geolocation data (e.g., latitudinal and longitudinal coordinates) associated with the content payload of the message 400. Multiple message geolocation parameter 416 values may be included in the payload, each of these parameter values being associated with respect to content items included in the content (e.g., a specific image within the message image payload 406, or a specific video in the message video payload 408).
-   A message story identifier 418: values identifying one or more content collections (e.g., “stories”) with which a particular content item in the message image payload 406 of the message 400 is associated. For example, multiple images within the message image payload 406 may each be associated with multiple content collections using identifier values.
-   A message tag 420: each message 400 may be tagged with multiple tags, each of which is indicative of the subject matter of content included in the message payload. For example, where a particular image included in the message image payload 406 depicts an animal (e.g., a lion), a tag value may be included within the message tag 420 that is indicative of the relevant animal. Tag values may be generated manually, based on user input, or may be automatically generated using, for example, image recognition.
-   A message sender identifier 422: an identifier (e.g., a messaging system identifier, email address or device identifier) indicative of a user of the client device 102 on which the message 400 was generated and from which the message 400 was sent.
-   A message receiver identifier 424: an identifier (e.g., a messaging system identifier, email address or device identifier) indicative of a user of the client device 102 to which the message 400 is addressed.
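For illustration only, the components listed above can be sketched as a simple data structure. This is a hedged, minimal representation; the field names and types are assumptions mapped to the numbered components, not the messaging client application's own format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Message:
    message_id: str                                           # message identifier 402
    text_payload: Optional[str] = None                        # message text payload 404
    image_payload: Optional[bytes] = None                     # message image payload 406
    video_payload: Optional[bytes] = None                     # message video payload 408
    audio_payload: Optional[bytes] = None                     # message audio payload 410
    annotations: List[dict] = field(default_factory=list)     # message annotations 412
    duration_seconds: Optional[float] = None                  # message duration parameter 414
    geolocations: List[Tuple[float, float]] = field(default_factory=list)  # message geolocation parameter 416
    story_ids: List[str] = field(default_factory=list)        # message story identifier 418
    tags: List[str] = field(default_factory=list)             # message tag 420
    sender_id: str = ""                                       # message sender identifier 422
    receiver_id: str = ""                                     # message receiver identifier 424
```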

The contents (e.g., values) of the various components of message 400 may be pointers to locations in tables within which content data values are stored. For example, an image value in the message image payload 406 may be a pointer to (or address of) a location within an image table 308. Similarly, values within the message video payload 408 may point to data stored within a video table 310, values stored within the message annotations 412 may point to data stored in an annotation table 312, values stored within the message story identifier 418 may point to data stored in a story table 306, and values stored within the message sender identifier 422 and the message receiver identifier 424 may point to user records stored within an entity table 302.

FIG. 5 illustrates aspects of a client 504 for local image tagging and visual search according to some example embodiments. The example client 504 includes input and output (I/O) module 552, content characteristic analysis module 554, object recognition module 556, and content database 558.

I/O module 552 may include any hardware, firmware, or software elements needed to send and receive content. In some embodiments this includes image sensors. In some embodiments this includes network interfaces for receiving images from other devices via a messaging network. Content characteristic analysis module 554 may include devices, processors, and software to analyze images from pictures and frames of video clips, and then determine content characteristics, including details about when and where a picture or video was generated. In certain embodiments, content characteristic analysis module 554 may be implemented as a plurality of different modules, each analyzing a different content characteristic, including any content characteristic described herein.

Object recognition module 556 describes a particular module that may be used to identify content characteristics based on the content of an image or images in a video. Object recognition module 556 includes hardware, firmware, and/or software for analyzing and understanding content. In one embodiment, object recognition module 556 is associated with a dictionary comprising image and video content values. Objects identified in images of a piece of content and the arrangement of the identified objects therein may be used by object recognition module 556, in such an embodiment, to select one or more content values from the dictionary as content characteristics. For example, a simple object recognition module 556 may identify a ball in an image, and select the values “ball” and “game” as content characteristics. A more complex module may identify the type of ball as a basketball, and include “basketball” as a characteristic value. A still more complex object recognition module 556 may identify a basketball, a crowd, a court color, and an elevated perspective of the court to identify “professional basketball game” and “basketball arena” as content values for the content. The same complex object recognition module 556 may identify a basketball, a park background, and a concrete court surface and associate “amateur basketball game” and “playground basketball” as content values for the content that is illustrated as an example in FIG. 6. Such content values may operate as context values which are used to generate content collections as described herein. Other types of context values besides such content values, however, may be used to generate content collections without using content values, or in addition to such content values. For example, one embodiment of an image may have associated context data comprising location data (e.g. coordinates or a geofence), time data (e.g. a time of day, a day of the month, an hour, etc.), content values (e.g. trees, basketball court, a face, etc.), quality values (e.g. blur, exposure, brightness, contrast, etc.), or any other such values, which are referred to herein as context data.

These content values generated by object recognition module 556 can then be stored in content database 558 along with other characteristic values. Such characteristic values can include: one or more content values (i.e., an identification of what is in the content); a generation time; a generation time period; a generation location; a generation area; one or more quality values; any metadata value associated with content; an identifier for a particular piece of content; or any other such values. In some embodiments, a copy of content may be stored in content database 558 with location information, capture time information, and any other such information about a piece of content. In certain embodiments, content database 558 may anonymously store details about content use. For example, client devices 102 can communicate details about presentation of the content on a screen of the device, and about screenshots taken of the content. Anonymous metrics about how often a piece of content is viewed as part of a content collection, how long the content is viewed for, and how frequently screenshots are taken may then be measured by client 504, as part of analysis by content characteristic analysis module 554, with the resulting data stored in content database 558. In some embodiments, content database 558 may include this content information with any content or content message information discussed above with respect to FIG. 4 or in any database or table structure discussed above.
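A minimal sketch of how such characteristic values might be stored locally is shown below. The SQLite schema, table name, and example values are assumptions for illustration and are not the content database 558 implementation itself.

```python
import sqlite3

# Hypothetical local store for per-content characteristic values.
conn = sqlite3.connect("content.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS content_characteristics (
           content_id TEXT,
           characteristic TEXT,   -- e.g. 'content_value', 'capture_time', 'location'
           value TEXT
       )"""
)

def store_characteristics(content_id, characteristics):
    """characteristics: iterable of (characteristic_name, value) pairs."""
    rows = [(content_id, name, str(value)) for name, value in characteristics]
    conn.executemany("INSERT INTO content_characteristics VALUES (?, ?, ?)", rows)
    conn.commit()

store_characteristics("img_001", [
    ("content_value", "amateur basketball game"),
    ("content_value", "playground basketball"),
    ("capture_time", "2016-07-05T17:32:00"),
])
```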

FIG. 6 shows aspects of a user interface for a message device 600 that may be used as part of a system as described herein. Message device 600, for example, may operate any elements of client 504 or client devices 102. FIG. 6 shows message device 600 with display area 606, which is a touch screen operating as both an output display and an input device. Device 600 may be used to capture content, which is then processed and analyzed according to embodiments described herein. The content illustrated in display area 606, for example, may be processed by the object recognition module 556 to identify a basketball, a park background, and a concrete court surface and associate “amateur basketball game” and “playground basketball” as context values for the content. Depending on other context values, such as location data, the context may be identified as “school” or “park” or “university”.

In addition to various user interface elements, display area 606 displays image 690 (e.g., the image 690 for content generated by the device 600), which includes both image data from a camera of device 600 as well as image capture user interface elements. Interface 607, for example, provides input options to send messages. Interface element 609 may be used to initiate capture of content (e.g., images or video clips) using the camera. Such content may then be analyzed locally as part of local organization or search within a gallery of content stored on the device 600 in accordance with the embodiments described herein.

FIG. 7 illustrates aspects of a system for image tagging and visual search according to some embodiments. As described above, image data 702 for an image is accessed by a mobile device such as client device 102 or machine 1300. The mobile device performs processing operations to generate extended visual search tags which are stored in a search database 732 or any such memory structure of the mobile device. These extended visual search tags are then used to generate a set of search results 734 when a search input is received at the mobile device. The extended visual search tags may be generated by a combination of image tagging and metadata analysis, or in various combinations of analysis with the image tagging operations described herein.

In the embodiment of FIG. 7, deep convolutional neural network (DCNN) 704 processes image data 702 to generate visual tags 708. In some embodiments, the DCNN comprises a neural network structure along with a set of predetermined weights. The weights are determined by prior training of the DCNN using a set number of image recognition tags or items. In other words, the DCNN is trained to recognize a limited number of items. A DCNN trained to recognize a beach and a tree will generate output values associated with “beach” and “tree”. The output values are compared with a threshold to determine if a tag for the items is associated with the image in the search database 732. Items in the image but not part of the DCNN training will simply be ignored. The number of items directly scored as part of the object recognition operation performed by DCNN 704 is thus limited, and can vary from tens of items to hundreds of items in some embodiments. As processing resources in mobile devices become more effective, thousands of items may be directly analyzed by a DCNN 704 operating in a resource constrained mobile device. In such systems, the use of a DCNN described in FIG. 8 provides more efficient resource usage to allow additional items and faster processing in a mobile environment. A knowledge graph 706 may additionally be used to generate more complicated tags from a limited set of items that are directly trained into and analyzed by a DCNN.

A set of visual tags 708 are then assigned to the image data 702 based on output values from the DCNN 704. Visual tags 708 include output values for particular items that are part of the DCNN prior training, including tree with a value of 0.518, grass with a value of 0.434, table with a value of 0.309, and basketball court with a value of 0.309. These values are presented for illustrative purposes, and it is to be understood that different values and items may be used in different embodiments. Visual tags 708 include items for which the output scores exceed a threshold (e.g. 0.3). Other items, such as beach, cat, dog, car, house, or shoes, may be items that are part of DCNN training, but for which the output values are below the threshold. For those items, no associated tag is generated for image data 702. Additional details of DCNN operation are described below with respect to FIGS. 8 and 9.
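The thresholding step described above can be sketched as follows. This is a hedged example; the label set, score values, and threshold simply mirror the illustrative numbers in the text and are not the trained DCNN 704 itself.

```python
def scores_to_tags(labels, scores, threshold=0.3):
    """Keep only (label, score) pairs whose DCNN output score meets the threshold."""
    return [(label, score) for label, score in zip(labels, scores) if score >= threshold]

# Illustrative output values similar to those described for visual tags 708.
labels = ["tree", "grass", "table", "basketball court", "beach", "cat"]
scores = [0.518, 0.434, 0.309, 0.309, 0.12, 0.05]

print(scores_to_tags(labels, scores))
# [('tree', 0.518), ('grass', 0.434), ('table', 0.309), ('basketball court', 0.309)]
```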

In addition to the use of DCNN operations to generate visual tags to assist with visual search, natural language processing or other processing of metadata can be used in conjunction with visual tags 708 to provide more comprehensive search results. Metadata 722 includes data such as location data from a positioning system that captures a device location when image data 702 is generated, as well as time data for a capture time associated with image data 702. Natural language processing 724 can be used to associate both visual tags 708 and metadata 722 with much more complex natural language matches 726 for inclusion in search database 732 and in search results 734 for natural language search inputs.

In various embodiments such as the embodiment of FIG. 7, because tagging, indexing, and ranking are all performed on the device, the search experience is extremely fast. For example, the search autocomplete function returns almost instantly, leading to a much better user experience than similar server-side autocomplete methods. In addition to traditional keyword/tag matching, semantic matching for image search tags based on natural language processing techniques significantly increases the coverage and quality of the image search experience. In one embodiment, a semantic augmentation technique works by analyzing each word in the search query and matching it semantically to the best possible visual tags generated by the visual recognition algorithm.

One particular embodiment of semantic augmentation comprises the use of ensembles of textual embeddings from a diverse set of very large textual datasets, such as Wikipedia™ and other such text sources. Textual embeddings are projections of words into a low dimensional space, from which similarity metrics can be derived. Using low dimensional similarity metrics, the textual embeddings are systematically aggregated, producing candidate lists of synonyms as well as similar words semantically related to the output tags obtained from the visual recognition algorithm. The candidate lists are filtered using heuristics and visual inspection. The output of this process is a database (e.g. search database 732 in some embodiments) of visual tags along with an associated list of synonyms and related words. This technique for semantic augmentation of visual tags using natural language processing significantly broadens the coverage of the search results and consequently improves the overall visual search experience.
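A minimal sketch of the semantic augmentation idea, assuming a pre-trained embedding table, is shown below. The function names, similarity threshold, and use of a single cosine-similarity pass are illustrative assumptions rather than the described aggregation over ensembles of embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def expand_tag(tag, embeddings, vocabulary, top_k=5, min_sim=0.6):
    """Return vocabulary words most similar to a visual tag in embedding space,
    forming a candidate list of synonyms and related words for that tag."""
    if tag not in embeddings:
        return []
    tag_vec = embeddings[tag]
    scored = [
        (word, cosine(tag_vec, embeddings[word]))
        for word in vocabulary
        if word != tag and word in embeddings
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, sim in scored[:top_k] if sim >= min_sim]
```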

FIG. 8 illustrates aspects of a DCNN in accordance with various embodiments described herein. The deep neural network structure consists of a variable number of basic layers. The partial DCNN structure 800 of FIG. 8 includes layers 801 through 808. There are multiple types of layers, such as a convolution layer, a pooling layer, and a fully-connected layer. From a high-level abstraction, the convolution layer(s) and pooling layer(s) serve as a feature descriptor extractor (e.g. similar to descriptors for a person, like a fingerprint or face description), and the fully-connected layer(s) serve as the classifier. Each layer has associated parameters that are set during a training phase that is used to generate the particular DCNN that is communicated to and used by a mobile device after the parameters have been determined by training. Such neural network models may have tens of millions of parameters, and are extremely computationally expensive. Training, therefore, is conducted remotely via high resource computing systems, and the parameters and any necessary details of the DCNN are then transferred to mobile devices following training.

The convolutional layers are the core of the DCNN models used by various embodiments. The convolutional layer parameters include a set of learnable filters in a matrix form with a height and width less than the image being processed with the DCNN. During a processing operation where image data is analyzed with a DCNN, for a convolutional layer, each filter (also referred to as a kernel) is convolved across the width and height of the input image, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map of that filter. Sufficient activity associated with a particular kernel indicates that the DCNN was previously trained to identify and tag images using an item or image type (e.g. car, cat, tree) that set the parameters of the particular activated kernel. As illustrated in FIG. 9, multiple convolutional layers, each with associated kernels, may be part of a single DCNN to generate output values for image data.

Convolutional layers of layers 801-808 may generate convolution outputs 901, 902, 910, and so on. Such convolution outputs are computed by the following operation on two matrices (e.g. image data matrix A and kernel matrix B) having sizes M_A×N_A and M_B×N_B. The resulting convolutional output matrix C is computed as:

C(x, y) = Σ_{m=0}^{M_A−1} Σ_{n=0}^{N_A−1} A(m, n) B(x−m, y−n)  (1)

where x ∈ [0, M_A+M_B−1) and y ∈ [0, N_A+N_B−1).
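The following is a hedged reference sketch of equation (1), written as a direct (unoptimized) full 2-D convolution. It is intended only to make the indexing explicit, not to represent the DCNN's optimized implementation.

```python
import numpy as np

def convolve2d_full(a, b):
    """Full 2-D convolution of matrix a (M_A x N_A) with kernel b (M_B x N_B),
    producing C of size (M_A + M_B - 1) x (N_A + N_B - 1) per equation (1)."""
    m_a, n_a = a.shape
    m_b, n_b = b.shape
    c = np.zeros((m_a + m_b - 1, n_a + n_b - 1))
    for x in range(c.shape[0]):
        for y in range(c.shape[1]):
            total = 0.0
            for m in range(m_a):
                for n in range(n_a):
                    i, j = x - m, y - n          # index into kernel b
                    if 0 <= i < m_b and 0 <= j < n_b:
                        total += a[m, n] * b[i, j]
            c[x, y] = total
    return c
```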

Layers 801-808 may additionally include pooling or subsampling layers. A pooling layer is used to reduce the size of the output from a convolutional layer. Various embodiments use max-pooling layers (e.g. layers that select the largest value from a subset of the matrix and use that single value to replace multiple other matrix elements) and average-pooling layers (e.g. layers that replace a subset of a matrix with an average value of the replaced matrix elements). For example, in a matrix with 4×4 elements, by applying the max pooling operation on each 2×2 block (generating 1 output from each 2×2 block), the output would be a 2×2 matrix. In a max-pooling layer, each 2×2 block is replaced with a single element having the value of the highest value element from the previous 2×2 block. The four 2×2 blocks of the original 4×4 matrix are thus replaced with single elements to generate a new 2×2 matrix.
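A minimal sketch of non-overlapping max-pooling and average-pooling over 2×2 blocks, matching the 4×4 example above, is shown below. The function name and the assumption of evenly divisible dimensions are illustrative.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over size x size blocks (dims assumed divisible by size)."""
    h, w = x.shape
    blocks = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))      # max-pooling: keep the largest value per block
    return blocks.mean(axis=(1, 3))         # average-pooling: replace each block by its mean

x = np.arange(16.0).reshape(4, 4)           # a 4x4 input matrix
print(pool2d(x, 2, "max"))                  # 2x2 matrix of per-block maxima
print(pool2d(x, 2, "avg"))                  # 2x2 matrix of per-block averages
```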

A fully connected layer is one in which each node at a layer connects to all the nodes of the previous layer. Such a layer may be defined mathematically by the inner product of the previous layer's output and the layer parameters.
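
As a small illustrative sketch, such a fully connected layer can be expressed as a matrix-vector inner product (the function name and the optional bias term are assumptions for illustration):

import numpy as np

def fully_connected(prev_output, weights, bias):
    """Each output node is the inner product of the previous layer's output
    with that node's weight vector, plus an optional bias term.
    weights: (n_out, n_in), prev_output: (n_in,), bias: (n_out,)."""
    return weights @ prev_output + bias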

The embodiment of FIG. 8 illustrates that different layers may connect with other layers in a variety of ways. The layers of a DCNN may thus be structured as subgraphs, or collections of layers where some layers do not pass data to another subgraph directly. Partial DCNN structure 800 includes two subgraphs, shown as subgraph 810 and subgraph 820. Layer 801 is illustrated as passing intermediate data to both layer 2 and layer 3 of subgraph 810, but does not pass data to subgraph 820 (or any other subgraph).

Traditional convolutional neural network based approaches perform a forward inference sequentially and save all the intermediate outputs. Because an image recognition application typically contains tens of layers in a DCNN, previous DCNN schemas create tens or hundreds of intermediate output layers that consume memory. To alleviate this intensive memory consumption issue, intermediate layers that do not pass data to another subgraph have their associated intermediate data deleted (e.g., actively deleted, or alternatively the previously used memory is made available for other purposes without actively erasing or overwriting bits until needed). Some such embodiments operate by analyzing the dependency of layers (e.g., layers 801-808) in a layer graph (e.g., the layer graph of FIG. 8). For example, such analysis may denote {O_i}, i = 1, . . . , N_t, as the layer outputs from layers {L_i}, i = 1, . . . , N_t, and denote {L_i}, i = N_t+1, . . . , N, as all the remaining layers for which forward passing has not yet been performed. The analysis identifies all the edges that connect subgraph {L_i}, i = 1, . . . , N_t, and subgraph {L_i}, i = N_t+1, . . . , N. All the layer outputs that are not related to the connecting edges are immediately deleted. Because the deleted layer outputs do not contribute to any following inference, this does not affect the recognition result (e.g., tags associated with an image), and in some embodiments saves more than half of the memory consumption relative to previous DCNN operations.
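
A simplified sketch of this dependency-driven memory reuse is shown below; the graph representation, function names, and bookkeeping scheme are assumptions for illustration rather than the analysis described above, but the effect is the same: an intermediate output is released as soon as every layer that consumes it has completed its forward pass.

def forward_with_memory_reuse(layers, consumers, run_layer):
    """layers:    list of layer ids in forward (topological) order
       consumers: dict mapping layer id -> set of layer ids that read its output
       run_layer: callable(layer_id, inputs_dict) -> output tensor"""
    outputs = {}                                   # live intermediate outputs
    pending = {lid: set(consumers.get(lid, ())) for lid in layers}
    for lid in layers:
        # Gather outputs from every producer that lists this layer as a consumer.
        inputs = {src: outputs[src] for src in outputs if lid in pending[src]}
        outputs[lid] = run_layer(lid, inputs)
        for src in list(inputs):
            pending[src].discard(lid)              # this consumer is now done
            if not pending[src]:                   # no remaining consumers
                del outputs[src]                   # release the intermediate data
    return outputs                                 # only final (unconsumed) outputs remain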

FIG. 9 then illustrates aspects of visual tagging and visual search in accordance with some embodiments. In addition to the subgraph dependency optimization described above, some embodiments additionally include cross-region feature sharing operations. As described above, in a convolutional layer, a filter/kernel is convolved across the width and height of the input image. In standard operation of prior DCNNs, the use of multiple windows has been proven to consistently improve object recognition performance. This previous operation is implemented by cropping an input image into multiple sub-windows and then aggregating the sub-window recognition results. In such known systems, each sub-window is classified independently. There are two obvious drawbacks to this approach: (1) it largely ignores the fact that convolutional outputs in different sub-windows are actually partially shared; and (2) it is very costly to crop sub-windows, because the running time is linear with respect to the number of crops performed. In embodiments described above, the convolutional kernels are applied layer by layer to the whole image, and any sub-windows are pulled out after a final convolutional layer. In this way, the convolutional kernels are applied only once to the overlapping areas of the sub-windows, which significantly saves computation resources in a resource-limited mobile environment.

To improve performance in resource-constrained mobile devices, some embodiments use a neural network architecture that contains only convolutional layers except for the last prediction layer. During training time (e.g., generation of values on a system with significant resources), the last layer serves as a fully-connected layer to produce classification scores. Following training of the DCNN, this fully connected layer is converted to a convolutional layer. In this way, each convolutional kernel produces a prediction score map for an image category. In such embodiments, the generated framework is capable of obtaining dense sub-window recognition scores by applying convolutional kernels layer by layer to the whole image (instead of to cropped sub-windows).
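
One way to picture this conversion, purely as a hedged illustration, is as applying the trained fully connected weights as a 1×1 convolution at every spatial position of the final feature map, so that each category's weight vector yields a dense prediction score map; the shapes and the helper below are assumptions for illustration, not the embodiments' implementation:

import numpy as np

def fc_as_convolution(feature_map, fc_weights, fc_bias):
    """feature_map: (C, H, W) output of the last true convolutional layer.
       fc_weights:  (num_classes, C) weights trained as a fully connected layer.
       fc_bias:     (num_classes,) trained biases.
       Returns (num_classes, H, W): the fully connected layer applied as a
       1x1 convolution at every spatial position, i.e. one score map per category."""
    C, H, W = feature_map.shape
    flat = feature_map.reshape(C, H * W)              # (C, H*W)
    scores = fc_weights @ flat + fc_bias[:, None]     # (num_classes, H*W)
    return scores.reshape(-1, H, W)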

The values generated by training are then used to implement a DCNN on a mobile device. As illustrated by FIG. 9, the trained DCNN 900 includes convolutional layers 901-910, where the last convolutional layer 910 is the convolutional layer converted from the fully connected layer during training. Using this convolutional layer 910 converted from the fully connected layer, score maps 911-920 are generated.

Further still, in addition to various uses of the subgraph dependency memory optimization and the cross-region feature sharing described above, various embodiments use estimated or compressed weight values. Typical DCNN implementations train, store, and use floating point weights as 32-bit floating point numbers when analyzing images. In order to reduce memory usage when applying the trained model in the resource-constrained environment of the mobile devices described herein, in some embodiments 16-bit half-precision values are used to store the floating point weights, thus saving 50% of the memory usage. In another embodiment, 32-bit floating point weights are compressed to 8-bit indices, with the original weights quantized into 256 bins. This quantization can be adaptively performed for each layer with clustering methods, such as k-means clustering. Using such compression, the memory size is reduced by about 75%, and the 8-bit weights of each layer are decompressed only when they are needed.
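
An illustrative sketch of this per-layer quantization (a simple k-means over the weight values; the function names, initialization, and iteration count are assumptions for illustration) might look like the following, with decompression deferred until the layer is actually needed:

import numpy as np

def quantize_layer_weights(weights, num_bins=256, iters=20):
    """Quantize one layer's 32-bit float weights into num_bins k-means clusters,
    storing an 8-bit index per weight plus a small codebook of centroids."""
    flat = weights.astype(np.float32).ravel()
    centroids = np.linspace(flat.min(), flat.max(), num_bins)   # initial codebook
    for _ in range(iters):
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(num_bins):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1).astype(np.uint8)
    return idx.reshape(weights.shape), centroids                # indices + codebook

def dequantize_layer_weights(indices, centroids):
    """Decompress a layer's weights only when that layer is about to be used."""
    return centroids[indices]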

FIGS. 9 and 10 illustrate aspects of operations for image processing and visual search in accordance with embodiments described herein. FIG. 10 describes one example method for object recognition, with method 1000 using a DCNN to generate tags for an image. Method 1000 is performed by a resource-limited mobile device, such as client device 102 or a mobile machine 1300. In other embodiments, method 1000 is performed by a device comprising instructions that, when executed by one or more processors of a mobile device, cause the mobile device to perform the method 1000.

Method 1000 begins with operation 1002, capturing, by an image sensor of the mobile device, a first image. The image sensor may, for example, be an image sensor of I/O components 1318 in some embodiments. In operation 1004, the mobile device processes the first image as captured by the image sensor to generate a file comprising the image data at a first pixel resolution associated with a pixel height and a pixel width. Image data for the first image will therefore have a format or resolution corresponding to the pixel height and pixel width, which is later used in the correlation with DCNN convolution kernels. Operation 1006 involves accessing, by one or more processors of the mobile device, image data for the first image as stored on the mobile device. The DCNN on the mobile device then begins processing the image data in operation 1008 using a deep convolutional neural network (DCNN) executed by the one or more processors. The structure of the DCNN includes at least a first subgraph and a second subgraph, the first subgraph comprising at least a first layer and a second layer. Such a structure may include a wide variety of connections between different layers, and may include convolutional layers performing convolution and subsampling operations.

As part of the DCNN analysis, operation 1010 includes processing, by the mobile device, the image data using at least the first layer of the first subgraph to generate first intermediate output data. Operation 1012 then proceeds with processing, by the mobile device, the first intermediate output data using at least the second layer of the first subgraph to generate first subgraph output data; and in response to a determination in operation 1014 that each layer reliant on the first intermediate data has completed processing, the first intermediate data is deleted from the mobile device. As described above, such a deletion may simply involve making the memory space that stored the intermediate data and passed it along for use by a subsequent layer available to store other information. The data may remain in the memory after the deletion until it is specifically overwritten as the memory space is used for other data.

Following this, the DCNN operation proceeds through all the layers, including any repetitions of similar operations or other operations, until output values are generated; in operation 1030, one or more tags are assigned to the first image based on the output values from the DCNN.

It will be apparent that the operations of method 1000 may be used in alternative embodiments, with various operations repeated or presented in different orders, and with any number of intermediate operations, as long as the particular combination is operable on a resource-limited mobile device.

FIG. 11, for example, describes method 1100, which includes additional operations that occur in some embodiments following operation 1008. In method 1100, operation 1104 involves processing image data for an image using a first layer of a first subgraph to convolve a first kernel with the image data. The kernel is a matrix with a height and width smaller than the corresponding pixel height and width of the image. Thus, as described above, the kernel is convolved with the entire image.

In operation 1106, the DCNN generates a plurality of output values, where each output value is associated with a corresponding tag representing a possible item in an image. The output value varies based on the strength of the match estimated by the DCNN as previously trained. Each output value of the plurality of output values is compared with a corresponding threshold in operation 1108, and in operation 1110, one or more tags are assigned to the first image based on the comparison. The strength of the output value thus represents the confidence that the image includes the item associated with the particular output value.
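
A minimal sketch of operations 1108 and 1110, assuming the output values and thresholds are keyed by tag (an illustrative data layout, not one required by the embodiments):

def assign_tags(output_values, thresholds, default_threshold=0.5):
    """output_values: dict tag -> score from the DCNN; thresholds: dict tag -> cutoff.
    A tag is assigned when its score (the model's confidence) meets its threshold."""
    return [tag for tag, score in output_values.items()
            if score >= thresholds.get(tag, default_threshold)]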

In corresponding operations for semantic analysis or natural language processing, metadata associated with the image is captured in operation 1112. Such data may be any data captured by sensors, such as the sensors of I/O components 1318 for machine 1300. Such data may also be user input data rather than sensor data. Examples of such additional data include location data or time data for the position and time when the image was captured by the mobile device. In operation 1114, the mobile device processes the tags and the metadata to generate a set of extended search tags. In operation 1116, a set of search results is generated by comparing a user input search term with extended visual search information or tags. The search results may simply be a set of presented images, or may include images along with natural language result indicators identifying why the images are classified as results of the input search. In some embodiments, extended visual search tags (e.g., search information) are stored in a database as part of processing a newly captured image. In other embodiments, an image may be stored with DCNN-produced tags and metadata, and extended visual search tags may only be generated when a user inputs a search.
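
The following sketch combines the DCNN tags, the captured metadata, and the synonym database from the semantic augmentation described earlier into extended search tags, and matches them against user query terms; all names and the simple set-intersection matching are illustrative assumptions rather than the embodiments' implementation:

def extend_search_tags(tags, metadata, synonym_db):
    """Combine DCNN tags with capture metadata (e.g. location, time) and the
    synonym/related-word database to form extended visual search tags."""
    extended = set(tags)
    for tag in tags:
        extended.update(synonym_db.get(tag, []))        # semantic augmentation
    extended.update(str(v) for v in metadata.values())  # e.g. place name, date
    return extended

def search_images(query_terms, image_index):
    """image_index: dict image_id -> set of extended search tags for that image."""
    query = {t.lower() for t in query_terms}
    return [img for img, tags in image_index.items()
            if query & {t.lower() for t in tags}]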

In addition to the above specifically described embodiments, it will be apparent that other embodiments using the operations and DCNN structures described herein are possible.

Software Architecture

FIG. 12 is a block diagram illustrating an example software architecture 1206, which may be used in conjunction with various hardware architectures herein described. FIG. 12 is a non-limiting example of a software architecture 1206 and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 1206 may execute on hardware such as machine 1300 of FIG. 13 that includes, among other things, processors 1304, memory 1314, and I/O components 1318. A representative hardware layer 1252 is illustrated and can represent, for example, the machine 1300 of FIG. 13. The representative hardware layer 1252 includes a processing unit 1254 having associated executable instructions 1204. Executable instructions 1204 represent the executable instructions of the software architecture 1206, including implementation of the methods, components, and so forth described herein. The hardware layer 1252 also includes memory and/or storage modules (memory/storage), which also have executable instructions 1204. The hardware layer 1252 may also comprise other hardware 1258.

In the example architecture of FIG. 12, the software architecture 1206 may be conceptualized as a stack of layers where each layer provides particular functionality. For example, the software architecture 1206 may include layers such as an operating system 1202, libraries 1220, applications 1216, and a presentation layer 1214. Operationally, the applications 1216 and/or other components within the layers may invoke application programming interface (API) calls 1208 through the software stack and receive messages 1212 in response to the API calls 1208. The layers illustrated are representative in nature and not all software architectures have all layers. For example, some mobile or special purpose operating systems may not provide a frameworks/middleware 1218, while others may provide such a layer. Other software architectures may include additional or different layers.

The operating system 1202 may manage hardware resources and provide common services. The operating system 1202 may include, for example, a kernel 1222, services 1224, and drivers 1226. The kernel 1222 may act as an abstraction layer between the hardware and the other software layers. For example, the kernel 1222 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on. The services 1224 may provide other common services for the other software layers. The drivers 1226 are responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 1226 include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth depending on the hardware configuration.

The libraries 1220 provide a common infrastructure that is used by the applications 1216 and/or other components and/or layers. The libraries 1220 provide functionality that allows other software components to perform tasks in an easier fashion than interfacing directly with the underlying operating system 1202 functionality (e.g., kernel 1222, services 1224, and/or drivers 1226). The libraries 1220 may include system libraries 1244 (e.g., C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 1220 may include API libraries 1246 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphic content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and the like. The libraries 1220 may also include a wide variety of other libraries 1248 to provide many other APIs to the applications 1216 and other software components/modules.

The frameworks/middleware 1218 (also sometimes referred to as middleware) provide a higher-level common infrastructure that may be used by the applications 1216 and/or other software components/modules. For example, the frameworks/middleware 1218 may provide various graphic user interface (GUI) functions, high-level resource management, high-level location services, and so forth. The frameworks/middleware 1218 may provide a broad spectrum of other APIs that may be utilized by the applications 1216 and/or other software components/modules, some of which may be specific to a particular operating system 1202 or platform.

The applications 1216 include built-in applications 1238 and/or third-party applications 1240. Examples of representative built-in applications 1238 may include, but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, and/or a game application. Third-party applications 1240 may include an application developed using the ANDROID™ or IOS™ software development kit (SDK) by an entity other than the vendor of the particular platform, and may be mobile software running on a mobile operating system such as IOS™, ANDROID™, WINDOWS® Phone, or other mobile operating systems. The third-party applications 1240 may invoke the API calls 1208 provided by the mobile operating system (such as operating system 1202) to facilitate functionality described herein.

The applications 1216 may use built-in operating system functions (e.g., kernel 1222, services 1224, and/or drivers 1226), libraries 1220, and frameworks/middleware 1218 to create user interfaces to interact with users of the system. Alternatively, or additionally, in some systems interactions with a user may occur through a presentation layer, such as presentation layer 1214. In these systems, the application/component “logic” can be separated from the aspects of the application/component that interact with a user.

FIG. 13 is a block diagram illustrating components of a machine 1300, according to some example embodiments, able to read instructions from a machine-readable medium (e.g., a machine-readable storage medium) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 13 shows a diagrammatic representation of the machine 1300 in the example form of a computer system, within which instructions 1310 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1300 to perform any one or more of the methodologies discussed herein may be executed. As such, the instructions 1310 may be used to implement modules or components described herein. The instructions 1310 transform the general, non-programmed machine 1300 into a particular machine 1300 programmed to carry out the described and illustrated functions in the manner described. In alternative embodiments, the machine 1300 operates as a standalone device or may be coupled (e.g., networked) to other machines. In a networked deployment, the machine 1300 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine 1300 may comprise, but not be limited to, a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a set-top box (STB), a personal digital assistant (PDA), an entertainment media system, a cellular telephone, a smart phone, a mobile device, a wearable device (e.g., a smart watch), a smart home device (e.g., a smart appliance), other smart devices, a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1310, sequentially or otherwise, that specify actions to be taken by machine 1300. Further, while only a single machine 1300 is illustrated, the term “machine” shall also be taken to include a collection of machines that individually or jointly execute the instructions 1310 to perform any one or more of the methodologies discussed herein.

The machine 1300 may include processors 1304, memory/storage 1306, and I/O components 1318, which may be configured to communicate with each other such as via a bus 1302. The memory/storage 1306 may include a memory 1314, such as a main memory, or other memory storage, and a storage unit 1316, both accessible to the processors 1304 such as via the bus 1302. The storage unit 1316 and memory 1314 store the instructions 1310 embodying any one or more of the methodologies or functions described herein. The instructions 1310 may also reside, completely or partially, within the memory 1314, within the storage unit 1316, within at least one of the processors 1304 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1300. Accordingly, the memory 1314, the storage unit 1316, and the memory of processors 1304 are examples of machine-readable media.

The I/O components 1318 may include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1318 that are included in a particular machine 1300 will depend on the type of machine. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1318 may include many other components that are not shown in FIG. 13. The I/O components 1318 are grouped according to functionality merely for simplifying the following discussion and the grouping is in no way limiting. In various example embodiments, the I/O components 1318 may include output components 1326 and input components 1328. The output components 1326 may include visual components (e.g., a display such as a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, or a cathode ray tube (CRT)), acoustic components (e.g., speakers), haptic components (e.g., a vibratory motor, resistance mechanisms), other signal generators, and so forth. The input components 1328 may include alphanumeric input components (e.g., a keyboard, a touch screen configured to receive alphanumeric input, a photo-optical keyboard, or other alphanumeric input components), point based input components (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, or other pointing instrument), tactile input components (e.g., a physical button, a touch screen that provides location and/or force of touches or touch gestures, or other tactile input components), audio input components (e.g., a microphone), and the like.

In further example embodiments, the I/O components 1318 may include biometric components 1330, motion components 1334, environment components 1336, or position components 1338, among a wide array of other components. For example, the biometric components 1330 may include components to detect expressions (e.g., hand expressions, facial expressions, vocal expressions, body gestures, or eye tracking), measure biosignals (e.g., blood pressure, heart rate, body temperature, perspiration, or brain waves), identify a person (e.g., voice identification, retinal identification, facial identification, fingerprint identification, or electroencephalogram based identification), and the like. The motion components 1334 may include acceleration sensor components (e.g., accelerometer), gravitation sensor components, rotation sensor components (e.g., gyroscope), and so forth. The environment components 1336 may include, for example, illumination sensor components (e.g., photometer), temperature sensor components (e.g., one or more thermometers that detect ambient temperature), humidity sensor components, pressure sensor components (e.g., barometer), acoustic sensor components (e.g., one or more microphones that detect background noise), proximity sensor components (e.g., infrared sensors that detect nearby objects), gas sensors (e.g., gas detection sensors to detect concentrations of hazardous gases for safety or to measure pollutants in the atmosphere), or other components that may provide indications, measurements, or signals corresponding to a surrounding physical environment. The position components 1338 may include location sensor components (e.g., a Global Positioning System (GPS) receiver component), altitude sensor components (e.g., altimeters or barometers that detect air pressure from which altitude may be derived), orientation sensor components (e.g., magnetometers), and the like.

Communication may be implemented using a wide variety of technologies. The I/O components 1318 may include communication components 1340 operable to couple the machine 1300 to a network 1332 or devices 1320 via coupling 1322 and coupling 1324, respectively. For example, the communication components 1340 may include a network interface component or other suitable device to interface with the network 1332. In further examples, communication components 1340 may include wired communication components, wireless communication components, cellular communication components, Near Field Communication (NFC) components, Bluetooth® components (e.g., Bluetooth® Low Energy), Wi-Fi® components, and other communication components to provide communication via other modalities. The devices 1320 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a Universal Serial Bus (USB)).

Moreover, the communication components 1340 may detect identifiers or include components operable to detect identifiers. For example, the communication components 1340 may include Radio Frequency Identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as Universal Product Code (UPC) bar code, multi-dimensional bar codes such as Quick Response (QR) code, Aztec code, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, UCC RSS-2D bar code, and other optical codes), or acoustic detection components (e.g., microphones to identify tagged audio signals). In addition, a variety of information may be derived via the communication components 1340, such as location via Internet Protocol (IP) geo-location, location via Wi-Fi® signal triangulation, location via detecting an NFC beacon signal that may indicate a particular location, and so forth.

“CARRIER SIGNAL” in this context refers to any intangible medium that is capable of storing, encoding, or carrying instructions for execution by the machine, and includes digital or analog communications signals or other intangible medium to facilitate communication of such instructions. Instructions may be transmitted or received over the network using a transmission medium via a network interface device and using any one of a number of well-known transfer protocols.

“CLIENT DEVICE” in this context refers to any machine that interfaces to a communications network to obtain resources from one or more server systems or other client devices. A client device may be, but is not limited to, a mobile phone, desktop computer, laptop, portable digital assistant (PDA), smart phone, tablet, ultra book, netbook, multi-processor system, microprocessor-based or programmable consumer electronics, game console, set-top box, or any other communication device that a user may use to access a network.

“COMMUNICATIONS NETWORK” in this context refers to one or more portions of a network that may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a Wi-Fi® network, another type of network, or a combination of two or more such networks. For example, a network or a portion of a network may include a wireless or cellular network and the coupling may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or other type of cellular or wireless coupling. In this example, the coupling may implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1×RTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long Term Evolution (LTE) standard, others defined by various standard setting organizations, other long range protocols, or other data transfer technology.

“EPHEMERAL MESSAGE” in this context refers to a message that is accessible for a time-limited duration. An ephemeral message may be a text, an image, a video, and the like. The access time for the ephemeral message may be set by the message sender. Alternatively, the access time may be a default setting or a setting specified by the recipient. Regardless of the setting technique, the message is transitory, even if the message is temporarily stored in a non-transitory computer readable medium.

“MACHINE-READABLE MEDIUM” or “NON-TRANSITORY COMPUTER READABLE MEDIUM” in this context refers to a component, device, or other tangible media able to store instructions and data temporarily or permanently, and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., Electrically Erasable Programmable Read-Only Memory (EEPROM)), and/or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., code) for execution by a machine, such that the instructions, when executed by one or more processors of the machine, cause the machine to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” excludes signals per se.

“COMPONENT” in this context refers to a device, physical entity, or logic having boundaries defined by function or subroutine calls, branch points, application program interfaces (APIs), or other technologies that provide for the partitioning or modularization of particular processing or control functions. Components may be combined via their interfaces with other components to carry out a machine process. A component may be a packaged functional hardware unit designed for use with other components and a part of a program that usually performs a particular function of related functions. Components may constitute either software components (e.g., code embodied on a machine-readable medium) or hardware components. A “hardware component” is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware components of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware component that operates to perform certain operations as described herein. A hardware component may also be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware component may include dedicated circuitry or logic that is permanently configured to perform certain operations. A hardware component may be a special-purpose processor, such as a Field-Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC). A hardware component may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware component may include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware components become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions and are no longer general-purpose processors. It will be appreciated that the decision to implement a hardware component mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations. Accordingly, the phrase “hardware component” (or “hardware-implemented component”) should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Considering embodiments in which hardware components are temporarily configured (e.g., programmed), each of the hardware components need not be configured or instantiated at any one instance in time. For example, where a hardware component comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware components) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware component at one instance of time and to constitute a different hardware component at a different instance of time.

Hardware components can provide information to, and receive information from, other hardware components. Accordingly, the described hardware components may be regarded as being communicatively coupled. Where multiple hardware components exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware components. In embodiments in which multiple hardware components are configured or instantiated at different times, communications between such hardware components may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware components have access. For example, one hardware component may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware component may then, at a later time, access the memory device to retrieve and process the stored output. Hardware components may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information). The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented components that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented component” refers to a hardware component implemented using one or more processors. Similarly, the methods described herein may be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented components. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an Application Program Interface (API)). The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented components may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented components may be distributed across a number of geographic locations.

“PROCESSOR” in this context refers to any circuit or virtual circuit (a physical circuit emulated by logic executing on an actual processor) that manipulates data values according to control signals (e.g., “commands”, “op codes”, “machine code”, etc.) and which produces corresponding output signals that are applied to operate a machine. A processor may, for example, be a Central Processing Unit (CPU), a Reduced Instruction Set Computing (RISC) processor, a Complex Instruction Set Computing (CISC) processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Radio-Frequency Integrated Circuit (RFIC), or any combination thereof. A processor may further be a multi-core processor having two or more independent processors (sometimes referred to as “cores”) that may execute instructions contemporaneously.

“TIMESTAMP” in this context refers to a sequence of characters or encoded information identifying when a certain event occurred, for example giving date and time of day, sometimes accurate to a small fraction of a second.

What is claimed is:
 1. A method comprising: accessing, by one or more processors of a mobile device, image data; convolving, using a first convolution layer, a first plurality of kernels of a deep convolutional neural network (DCNN) with the image data to generate first intermediate data, wherein the first plurality of kernels comprise pixel heights less than a pixel height of the image data and pixel widths less than a pixel width of the image data, and wherein the first plurality of kernels are associated with a first plurality of tags; convolving, using a second convolution layer, a second plurality of kernels of the DCNN with the first intermediate data to generate second intermediate data, wherein the second plurality of kernels are associated with a second plurality of tags; determining, with a fully connected layer of the DCNN, a third intermediate data from the second intermediate data, the third intermediate data associated with a third plurality of tags; and assigning one or more tags of the third plurality of tags to portions of the image data based on a comparison of a plurality of output values with threshold values, wherein each tag of the first plurality of tags, the second plurality of tags, and the third plurality of tags indicates a prediction score map for an image category.
 2. The method of claim 1 wherein the method further comprises: capturing, using an image sensor of the mobile device, the image data.
 3. The method of claim 1 further comprising: converting the fully connected layer of the DCNN to a third convolution layer.
 4. The method of claim 3 wherein the third convolution layer comprises: a third plurality of kernels of the DCNN and wherein the third plurality of kernels are associated with the third plurality of tags.
 5. The method of claim 1 wherein the first convolution layer and the second convolution layer convolve across the pixel width of the image data and the pixel height of the image data.
 6. The method of claim 1 wherein the image data is not split into multiple sub-windows.
 7. The method of claim 1 wherein the DCNN comprises additional convolution layers.
 8. The method of claim 1 wherein 16-bit half precision values are used to store first intermediate data and second intermediate data.
 9. The method of claim 1 further comprising: receiving, by one or more processors of the mobile device, a plurality of weights for the DCNN, wherein the plurality of weights are floating point weights compressed to weight indices.
 10. The method of claim 9 further comprising: in response to convolving processing, using the first convolution layer, decompressing a first set of weight indices.
 11. The method of claim 1 wherein convolving, using the first convolution layer, further comprises: using a corresponding first set of floating point weights as decompressed from weight indices.
 12. The method of claim 1 wherein the DCNN comprises only convolutional layers and a fully connected layer.
 13. The method of claim 1 wherein the DCNN uses a plurality of weights comprising 16 bit weights, 32 bit weights, or 64 bit weights.
 14. The method of claim 1 wherein the DCNN refrains from using a max-pooling layer.
 15. The method of claim 1 further comprising: storing the image data and indications of the one or more assigned tags to portions of the image data.
 16. A mobile device for image tagging comprising: a memory; an image sensor coupled to the memory; and one or more processors coupled to the memory and configured to: access, by one or more processors of a mobile device, image data; convolve, using a first convolution layer, a first plurality of kernels of a deep convolutional neural network (DCNN) with the image data to generate first intermediate data, wherein the first plurality of kernels comprise pixel heights less than a pixel height of the image data and pixel widths less than a pixel width of the image data, and wherein the first plurality of kernels are associated with a first plurality of tags; convolve, using a second convolution layer, a second plurality of kernels of the DCNN with the first intermediate data to generate second intermediate data, wherein the second plurality of kernels are associated with a second plurality of tags; determine, with a fully connected layer of the DCNN, a third intermediate data from the second intermediate data, the third intermediate data associated with a third plurality of tags; and assign one or more tags of the third plurality of tags to portions of the image data based on a comparison of a plurality of output values with threshold values, wherein each tag of the first plurality of tags, the second plurality of tags, and the third plurality of tags indicates a prediction score map for an image category.
 17. A non-transitory storage medium comprising instructions that, when executed by one or more processors of a mobile device, cause the mobile device to perform operations for local image tagging, the operations comprising: accessing, by one or more processors of a mobile device, image data; convolving, using a first convolution layer, a first plurality of kernels of a deep convolutional neural network (DCNN) with the image data to generate first intermediate data, wherein the first plurality of kernels comprise pixel heights less than a pixel height of the image data and pixel widths less than a pixel width of the image data, and wherein the first plurality of kernels are associated with a first plurality of tags; convolving, using a second convolution layer, a second plurality of kernels of the DCNN with the first intermediate data to generate second intermediate data, wherein the second plurality of kernels are associated with a second plurality of tags; determining, with a fully connected layer of the DCNN, a third intermediate data from the second intermediate data, the third intermediate data associated with a third plurality of tags; and assigning one or more tags of the third plurality of tags to portions of the image data based on a comparison of a plurality of output values with threshold values, wherein each tag of the first plurality of tags, the second plurality of tags, and the third plurality of tags indicates a prediction score map for an image category.